Just a short, simple blog for Bob to share his thoughts.
28 February 2014 • by Bob • URL Rewrite, SEO, Classic ASP
In December of 2012 I wrote a blog titled "Using Classic ASP and URL Rewrite for Dynamic SEO Functionality", in which I described how to use Classic ASP and URL Rewrite to dynamically-generate Robots.txt and Sitemap.xml or Sitemap.txt files. I received a bunch of requests for additional information, so I wrote a follow-up blog this past November titled "Revisiting My Classic ASP and URL Rewrite for Dynamic SEO Functionality Examples" which illustrated how to limit the output for the Robots.asp and Sitemap.asp files to specific sections of your website.
That being said, I continue to receive requests for additional ways to stretch those samples, so I thought that I would write at least a couple of blogs on the subject. With that in mind, for today I wanted to show how you can add additional content types to the samples.
Here is a common question that I have been asked by several people:
"The example only works with *.html files; how do I include my other files?"
That's a great question, and additional content types are really easy to implement, and the majority of the code from my original blog will remain unchanged. Here's the file by file breakdown for the changes that need made:
Filename | Changes |
---|---|
Robots.asp | None |
Sitemap.asp | See the sample later in this blog |
Web.config | None |
If you are already using the files from my original blog, no changes need to be made to your Robots.asp file or the URL Rewrite rules in your Web.config file because the updates in this blog will only impact the output from the Sitemap.asp file.
My original sample contained a line of code which read "If StrComp(strExt,"html",vbTextCompare)=0 Then" and this line was used to restrict the sitemap output to static *.html files. For this new sample I need to make two changes:
Note: I define the constant near the beginning of the file so it's easier for other people to find; I would normally define that constant elsewhere in the code.
<% Option Explicit On Error Resume Next Const strContentTypes = "htm|html|asp|aspx|txt" Response.Clear Response.Buffer = True Response.AddHeader "Connection", "Keep-Alive" Response.CacheControl = "public" Dim strFolderArray, lngFolderArray Dim strUrlRoot, strPhysicalRoot, strFormat Dim strUrlRelative, strExt Dim objFSO, objFolder, objFile strPhysicalRoot = Server.MapPath("/") Set objFSO = Server.CreateObject("Scripting.Filesystemobject") strUrlRoot = "http://" & Request.ServerVariables("HTTP_HOST") ' Check for XML or TXT format. If UCase(Trim(Request("format")))="XML" Then strFormat = "XML" Response.ContentType = "text/xml" Else strFormat = "TXT" Response.ContentType = "text/plain" End If ' Add the UTF-8 Byte Order Mark. Response.Write Chr(CByte("&hEF")) Response.Write Chr(CByte("&hBB")) Response.Write Chr(CByte("&hBF")) If strFormat = "XML" Then Response.Write "<?xml version=""1.0"" encoding=""UTF-8""?>" & vbCrLf Response.Write "<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">" & vbCrLf End if ' Always output the root of the website. Call WriteUrl(strUrlRoot,Now,"weekly",strFormat) ' -------------------------------------------------- ' This following section contains the logic to parse ' the directory tree and return URLs based on the ' files that it locates. ' -------------------------------------------------- strFolderArray = GetFolderTree(strPhysicalRoot) For lngFolderArray = 1 to UBound(strFolderArray) strUrlRelative = Replace(Mid(strFolderArray(lngFolderArray),Len(strPhysicalRoot)+1),"\","/") Set objFolder = objFSO.GetFolder(Server.MapPath("." & strUrlRelative)) For Each objFile in objFolder.Files strExt = objFSO.GetExtensionName(objFile.Name) If InStr(1,strContentTypes,strExt,vbTextCompare) Then If StrComp(Left(objFile.Name,6),"google",vbTextCompare)<>0 Then Call WriteUrl(strUrlRoot & strUrlRelative & "/" & objFile.Name, objFile.DateLastModified, "weekly", strFormat) End If End If Next Next ' -------------------------------------------------- ' End of file system loop. ' -------------------------------------------------- If strFormat = "XML" Then Response.Write "</urlset>" End If Response.End ' ====================================================================== ' ' Outputs a sitemap URL to the client in XML or TXT format. ' ' tmpStrFreq = always|hourly|daily|weekly|monthly|yearly|never ' tmpStrFormat = TXT|XML ' ' ====================================================================== Sub WriteUrl(tmpStrUrl,tmpLastModified,tmpStrFreq,tmpStrFormat) On Error Resume Next Dim tmpDate : tmpDate = CDate(tmpLastModified) ' Check if the request is for XML or TXT and return the appropriate syntax. If tmpStrFormat = "XML" Then Response.Write " <url>" & vbCrLf Response.Write " <loc>" & Server.HtmlEncode(tmpStrUrl) & "</loc>" & vbCrLf Response.Write " <lastmod>" & Year(tmpLastModified) & "-" & Right("0" & Month(tmpLastModified),2) & "-" & Right("0" & Day(tmpLastModified),2) & "</lastmod>" & vbCrLf Response.Write " <changefreq>" & tmpStrFreq & "</changefreq>" & vbCrLf Response.Write " </url>" & vbCrLf Else Response.Write tmpStrUrl & vbCrLf End If End Sub ' ====================================================================== ' ' Returns a string array of folders under a root path ' ' ====================================================================== Function GetFolderTree(strBaseFolder) Dim tmpFolderCount,tmpBaseCount Dim tmpFolders() Dim tmpFSO,tmpFolder,tmpSubFolder ' Define the initial values for the folder counters. tmpFolderCount = 1 tmpBaseCount = 0 ' Dimension an array to hold the folder names. ReDim tmpFolders(1) ' Store the root folder in the array. tmpFolders(tmpFolderCount) = strBaseFolder ' Create file system object. Set tmpFSO = Server.CreateObject("Scripting.Filesystemobject") ' Loop while we still have folders to process. While tmpFolderCount <> tmpBaseCount ' Set up a folder object to a base folder. Set tmpFolder = tmpFSO.GetFolder(tmpFolders(tmpBaseCount+1)) ' Loop through the collection of subfolders for the base folder. For Each tmpSubFolder In tmpFolder.SubFolders ' Increment the folder count. tmpFolderCount = tmpFolderCount + 1 ' Increase the array size ReDim Preserve tmpFolders(tmpFolderCount) ' Store the folder name in the array. tmpFolders(tmpFolderCount) = tmpSubFolder.Path Next ' Increment the base folder counter. tmpBaseCount = tmpBaseCount + 1 Wend GetFolderTree = tmpFolders End Function %>
That's it. Pretty easy, eh?
I have also received several requests about creating a sitemap which contains URLs with query strings, but I'll cover that scenario in a later blog.
Note: This blog was originally posted at http://blogs.msdn.com/robert_mcmurray/
20 November 2013 • by Bob • Classic ASP, IIS, SEO, URL Rewrite
Last year I wrote a blog titled Using Classic ASP and URL Rewrite for Dynamic SEO Functionality, in which I described how you could combine Classic ASP and the URL Rewrite module for IIS to dynamically create Robots.txt and Sitemap.xml files for your website, thereby helping with your Search Engine Optimization (SEO) results. A few weeks ago I had a follow-up question which I thought was worth answering in a blog post.
Here is the question that I was asked:
"What if I don't want to include all dynamic pages in sitemap.xml but only a select few or some in certain directories because I don't want bots to crawl all of them. What can I do?"
That's a great question, and it wasn't tremendously difficult for me to update my original code samples to address this request. First of all, the majority of the code from my last blog will remain unchanged - here's the file by file breakdown for the changes that need made:
Filename | Changes |
---|---|
Robots.asp | None |
Sitemap.asp | See the sample later in this blog |
Web.config | None |
So if you are already using the files from my original blog, no changes need to be made to your Robot.asp file or the URL Rewrite rules in your Web.config file because the question only concerns the files that are returned in the the output for Sitemap.xml.
The good news it, I wrote most of the heavy duty code in my last blog - there were only a few changes that needed to made in order to accommodate the requested functionality. The main difference is that the original Sitemap.asp file used to have a section that recursively parsed the entire website and listed all of the files in the website, whereas this new version moves that section of code into a separate function to which you pass the unique folder name to parse recursively. This allows you to specify only those folders within your website that you want in the resultant sitemap output.
With that being said, here's the new code for the Sitemap.asp file:
<% Option Explicit On Error Resume Next Response.Clear Response.Buffer = True Response.AddHeader "Connection", "Keep-Alive" Response.CacheControl = "public" Dim strUrlRoot, strPhysicalRoot, strFormat Dim objFSO, objFolder, objFile strPhysicalRoot = Server.MapPath("/") Set objFSO = Server.CreateObject("Scripting.Filesystemobject") strUrlRoot = "http://" & Request.ServerVariables("HTTP_HOST") ' Check for XML or TXT format. If UCase(Trim(Request("format")))="XML" Then strFormat = "XML" Response.ContentType = "text/xml" Else strFormat = "TXT" Response.ContentType = "text/plain" End If ' Add the UTF-8 Byte Order Mark. Response.Write Chr(CByte("&hEF")) Response.Write Chr(CByte("&hBB")) Response.Write Chr(CByte("&hBF")) If strFormat = "XML" Then Response.Write "<?xml version=""1.0"" encoding=""UTF-8""?>" & vbCrLf Response.Write "<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">" & vbCrLf End if ' Always output the root of the website. Call WriteUrl(strUrlRoot,Now,"weekly",strFormat) ' Output only specific folders. Call ParseFolder("/marketing") Call ParseFolder("/sales") Call ParseFolder("/hr/jobs") ' -------------------------------------------------- ' End of file system loop. ' -------------------------------------------------- If strFormat = "XML" Then Response.Write "</urlset>" End If Response.End ' ====================================================================== ' ' Recursively walks a folder path and return URLs based on the ' static *.html files that it locates. ' ' strRootFolder = The base path for recursion ' ' ====================================================================== Sub ParseFolder(strParentFolder) On Error Resume Next Dim strChildFolders, lngChildFolders Dim strUrlRelative, strExt ' Get the list of child folders under a parent folder. strChildFolders = GetFolderTree(Server.MapPath(strParentFolder)) ' Loop through the collection of folders. For lngChildFolders = 1 to UBound(strChildFolders) strUrlRelative = Replace(Mid(strChildFolders(lngChildFolders),Len(strPhysicalRoot)+1),"\","/") Set objFolder = objFSO.GetFolder(Server.MapPath("." & strUrlRelative)) ' Loop through the collection of files. For Each objFile in objFolder.Files strExt = objFSO.GetExtensionName(objFile.Name) If StrComp(strExt,"html",vbTextCompare)=0 Then If StrComp(Left(objFile.Name,6),"google",vbTextCompare)<>0 Then Call WriteUrl(strUrlRoot & strUrlRelative & "/" & objFile.Name, objFile.DateLastModified, "weekly", strFormat) End If End If Next Next End Sub ' ====================================================================== ' ' Outputs a sitemap URL to the client in XML or TXT format. ' ' tmpStrFreq = always|hourly|daily|weekly|monthly|yearly|never ' tmpStrFormat = TXT|XML ' ' ====================================================================== Sub WriteUrl(tmpStrUrl,tmpLastModified,tmpStrFreq,tmpStrFormat) On Error Resume Next Dim tmpDate : tmpDate = CDate(tmpLastModified) ' Check if the request is for XML or TXT and return the appropriate syntax. If tmpStrFormat = "XML" Then Response.Write " <url>" & vbCrLf Response.Write " <loc>" & Server.HtmlEncode(tmpStrUrl) & "</loc>" & vbCrLf Response.Write " <lastmod>" & Year(tmpLastModified) & "-" & Right("0" & Month(tmpLastModified),2) & "-" & Right("0" & Day(tmpLastModified),2) & "</lastmod>" & vbCrLf Response.Write " <changefreq>" & tmpStrFreq & "</changefreq>" & vbCrLf Response.Write " </url>" & vbCrLf Else Response.Write tmpStrUrl & vbCrLf End If End Sub ' ====================================================================== ' ' Returns a string array of folders under a root path ' ' ====================================================================== Function GetFolderTree(strBaseFolder) Dim tmpFolderCount,tmpBaseCount Dim tmpFolders() Dim tmpFSO,tmpFolder,tmpSubFolder ' Define the initial values for the folder counters. tmpFolderCount = 1 tmpBaseCount = 0 ' Dimension an array to hold the folder names. ReDim tmpFolders(1) ' Store the root folder in the array. tmpFolders(tmpFolderCount) = strBaseFolder ' Create file system object. Set tmpFSO = Server.CreateObject("Scripting.Filesystemobject") ' Loop while we still have folders to process. While tmpFolderCount <> tmpBaseCount ' Set up a folder object to a base folder. Set tmpFolder = tmpFSO.GetFolder(tmpFolders(tmpBaseCount+1)) ' Loop through the collection of subfolders for the base folder. For Each tmpSubFolder In tmpFolder.SubFolders ' Increment the folder count. tmpFolderCount = tmpFolderCount + 1 ' Increase the array size ReDim Preserve tmpFolders(tmpFolderCount) ' Store the folder name in the array. tmpFolders(tmpFolderCount) = tmpSubFolder.Path Next ' Increment the base folder counter. tmpBaseCount = tmpBaseCount + 1 Wend GetFolderTree = tmpFolders End Function %>
It should be easily seen that the code is largely unchanged from my previous blog.
One last thing to consider, I didn't make any changes to the Robots.asp file in this blog. But that being said, when you do not want specific paths crawled, you should add rules to your Robots.txt file to disallow those paths. For example, here is a simple Robots.txt file which allows your entire website:
# Robots.txt # For more information on this file see: # http://www.robotstxt.org/ # Define the sitemap path Sitemap: http://localhost:53644/sitemap.xml # Make changes for all web spiders User-agent: * Allow: / Disallow:
If you were going to deny crawling on certain paths, you would need to add the specific paths that you do not want crawled to your Robots.txt file like the following example:
# Robots.txt # For more information on this file see: # http://www.robotstxt.org/ # Define the sitemap path Sitemap: http://localhost:53644/sitemap.xml # Make changes for all web spiders User-agent: * Disallow: /foo Disallow: /bar
With that being said, if you are using my Robots.asp file from my last blog, you would need to update the section of code that defines the paths like my previous example:
Response.Write "# Make changes for all web spiders" & vbCrLf Response.Write "User-agent: *" & vbCrLf Response.Write "Disallow: /foo" & vbCrLf Response.Write "Disallow: /bar" & vbCrLf
I hope this helps. ;-]
Note: This blog was originally posted at http://blogs.msdn.com/robert_mcmurray/
17 September 2013 • by Bob • IIS, WebDAV, URL Rewrite
I had an interesting WebDAV question earlier today that I had not considered before: how can someone create a "Blind Drop Share" using WebDAV? By way of explanation, a Blind Drop Share is a path where users can copy files, but never see the files that are in the directory. You can setup something like this by using NTFS permissions, but that environment can be a little difficult to set up and maintain.
With that in mind, I decided to research a WebDAV-specific solution that didn't require mangling my NTFS permissions. In the end it was pretty easy to achieve, so I thought that it would make a good blog for anyone who wants to do this.
NTFS permissions contain access controls that configure the directory-listing behavior for files and folders; if you modify those settings, you can control who can see files and folders when they connect to your shared resources. However, there are no built-in features for the WebDAV module which ships with IIS that will approximate the NTFS behavior. But that being said, there is an interesting WebDAV quirk that you can use that will allow you to restrict directory listings, and I will explain how that works.
WebDAV uses the PROPFIND
command to retrieve the properties for files and folders, and the WebDAV Redirector will use the response from a PROPFIND
command to display a directory listing. (Note: Official WebDAV terminology has no concept of files and folders, those physical objects are respectively called Resources and Collections in WebDAV jargon. But that being said, I will use files and folders throughout this blog post for ease of understanding.)
In any event, one of the HTTP headers that WebDAV uses with the PROPFIND
command is the Depth
header, which is used to specify how deep the folder/collection traversal should go:
PROPFIND
command for the root of your website with a Depth:0
header/value, you would get the properties for just the root directory - with no files listed; a Depth:0
header/value only retrieves properties for the single resource that you requested.PROPFIND
command for the root of your website with a Depth:1
header/value, you would get the properties for every file and folder in the root of your website; a Depth:1
header/value retrieves properties for the resource that you requested and all siblings.PROPFIND
command for the root of your website with a Depth:infinity
header/value, you would get the properties for every file and folder in your entire website; a Depth:infinity
header/value retrieves properties for every resource regardless of its depth in the hierarchy. (Note that retrieving directory listings with infinite depth are disabled by default in IIS 7 and IIS 8 because it can be CPU intensive.)By analyzing the above information, it should be obvious that what you need to do is to restrict users to using a Depth:0
header/value. But that's where this scenario gets interesting: if your end-users are using the Windows WebDAV Redirector or other similar technology to map a drive to your HTTP website, you have no control over the value of the Depth
header. So how can you restrict that?
In the past I would have written custom native-code HTTP module or ISAPI filter to modify the value of the Depth
header; but once you understand how WebDAV works, you can use the URL Rewrite module to modify the headers of incoming HTTP requests to accomplish some pretty cool things - like modifying the values WebDAV-related HTTP headers.
Here's how I configured URL Rewrite to set the value of the Depth
header to 0
, which allowed me to create a "Blind Drop" WebDAV site:
![]() |
Click image to expand |
![]() |
Click image to expand |
![]() |
Click image to expand |
![]() |
Click image to expand |
![]() |
Click image to expand |
![]() |
Click image to expand |
![]() |
![]() |
Click image to expand |
![]() |
Click image to expand |
![]() |
Click image to expand |
If all of these changes were saved to your applicationHost.config file, the resulting XML might resemble the following example - with XML comments added by me to highlight some of the major sections:
<location path="Default Web Site"> <system.webServer> <-- Start of Security Settings --> <security> <authentication> <anonymousAuthentication enabled="false" /> <basicAuthentication enabled="true" /> </authentication> <requestFiltering> <fileExtensions applyToWebDAV="false" /> <verbs applyToWebDAV="false" /> <hiddenSegments applyToWebDAV="false" /> </requestFiltering> </security> <-- Start of WebDAV Settings --> <webdav> <authoringRules> <add roles="administrators" path="*" access="Read, Write, Source" /> </authoringRules> <authoring enabled="true"> <properties allowAnonymousPropfind="false" allowInfinitePropfindDepth="true"> <clear /> <add xmlNamespace="*" propertyStore="webdav_simple_prop" /> </properties> </authoring> </webdav> <-- Start of URL Rewrite Settings --> <rewrite> <rules> <rule name="Modify Depth Header" enabled="true" patternSyntax="Wildcard"> <match url="*" /> <serverVariables> <set name="HTTP_DEPTH" value="0" /> </serverVariables> <action type="None" /> </rule> </rules> <allowedServerVariables> <add name="HTTP_DEPTH" /> </allowedServerVariables> </rewrite> </system.webServer> </location>
In all likelihood, some of these settings will be stored in your applicationHost.config file, and the remaining settings will be stored in the web.config file of your website.
If you did not have the URL Rewrite rule in place, or if you disabled the rule, then your web server might respond like the following example if you used the WebDAV Redirector to map a drive to your website from a command prompt:
C:\>net use z: http://www.contoso.com/ Enter the user name for 'www.contoso.com': www.contoso.com\robert Enter the password for www.contoso.com: The command completed successfully. C:\>z: Z:\>dir Volume in drive Z has no label. Volume Serial Number is 0000-0000 Directory of Z:\ 09/16/2013 08:55 PM <DIR> . 09/16/2013 08:55 PM <DIR> .. 09/14/2013 12:39 AM <DIR> aspnet_client 09/16/2013 08:06 PM <DIR> scripts 09/16/2013 07:55 PM 66 default.aspx 09/14/2013 12:38 AM 98,757 iis-85.png 09/14/2013 12:38 AM 694 iisstart.htm 09/16/2013 08:55 PM 75 web.config 4 File(s) 99,592 bytes 8 Dir(s) 956,202,631,168 bytes free Z:\>
However, when you have the URL Rewrite correctly configured and enabled, connecting to the same website will resemble the following example - notice how no files or folders are listed:
C:\>net use z: http://www.contoso.com/ Enter the user name for 'www.contoso.com': www.contoso.com\robert Enter the password for www.contoso.com: The command completed successfully. C:\>z: Z:\>dir Volume in drive Z has no label. Volume Serial Number is 0000-0000 Directory of Z:\ 09/16/2013 08:55 PM <DIR> . 09/16/2013 08:55 PM <DIR> .. 0 File(s) 0 bytes 2 Dir(s) 956,202,803,200 bytes free Z:\>
Despite the blank directory listing, you can still retrieve the properties for any file or folder if you know that it exists. So if you were to use the mapped drive from the preceding example, you could still use an explicit directory command for any object that you had uploaded or created:
Z:\>dir default.aspx Volume in drive Z has no label. Volume Serial Number is 0000-0000 Directory of Z:\ 09/16/2013 07:55 PM 66 default.aspx 1 File(s) 66 bytes 0 Dir(s) 956,202,799,104 bytes free Z:\>dir scripts Volume in drive Z has no label. Volume Serial Number is 0000-0000 Directory of Z:\scripts 09/16/2013 07:52 PM <DIR> . 09/16/2013 07:52 PM <DIR> .. 0 File(s) 0 bytes 2 Dir(s) 956,202,799,104 bytes free Z:\>
The same is true for creating directories and files; you can create them, but they will not show up in the directory listings after you have created them unless you reference them explicitly:
Z:\>md foobar Z:\>dir Volume in drive Z has no label. Volume Serial Number is 0000-0000 Directory of Z:\ 09/16/2013 11:52 PM <DIR> . 09/16/2013 11:52 PM <DIR> .. 0 File(s) 0 bytes 2 Dir(s) 956,202,618,880 bytes free Z:\>cd foobar Z:\foobar>copy NUL foobar.txt 1 file(s) copied. Z:\foobar>dir Volume in drive Z has no label. Volume Serial Number is 0000-0000 Directory of Z:\foobar 09/16/2013 11:52 PM <DIR> . 09/16/2013 11:52 PM <DIR> .. 0 File(s) 0 bytes 2 Dir(s) 956,202,303,488 bytes free Z:\foobar>dir foobar.txt Volume in drive Z has no label. Volume Serial Number is 0000-0000 Directory of Z:\foobar 09/16/2013 11:53 PM 0 foobar.txt 1 File(s) 0 bytes 0 Dir(s) 956,202,299,392 bytes free Z:\foobar>
That wraps it up for today's post, although I should point out that if you see any errors when you are using the WebDAV Redirector, you should take a look at the Troubleshooting the WebDAV Redirector section of my Using the WebDAV Redirector article; I have done my best to list every error and resolution that I have discovered over the past several years.
Note: This blog was originally posted at http://blogs.msdn.com/robert_mcmurray/
31 December 2012 • by Bob • IIS, URL Rewrite, SEO, Classic ASP
I had another interesting situation present itself recently that I thought would make a good blog: how to use Classic ASP with the IIS URL Rewrite module to dynamically generate Robots.txt and Sitemap.xml files.
Here's the situation: I host a website for one of my family members, and like everyone else on the Internet, he wanted some better SEO rankings. We discussed a few things that he could do to improve his visibility with search engines, and one of the suggestions that I gave him was to keep his Robots.txt and Sitemap.xml files up-to-date. But there was an additional caveat - he uses two separate DNS names for the same website, and that presents a problem for absolute URLs in either of those files. Before anyone points out that it's usually not a good idea to host multiple DNS names on the same content, there are times when this is acceptable; for example, if you are trying to decide which of several DNS names is the best to use, you might want to bind each name to the same IP address and parse your logs to find out which address is getting the most traffic.
In any event, the syntax for both Robots.txt and Sitemap.xml files is pretty easy, so I wrote a couple of simple Classic ASP Robots.asp and Sitemap.asp pages that output the correct syntax and DNS-specific URLs for each domain name, and I wrote some simple URL Rewrite rules that rewrite inbound requests for Robots.txt and Sitemap.xml files to the ASP pages, while blocking direct access to the Classic ASP pages themselves.
All of that being said, there are a couple of quick things that I would like to mention before I get to the code:
That being said, let's move on to the actual code.
There are three files that you will need to create for this example:
You need to save the following code sample as Robots.asp in the root of your website; this page will be executed whenever someone requests the Robots.txt file for your website. This example is very simple: it checks for the requested hostname and uses that to dynamically create the absolute URL for the website's Sitemap.xml file.
<% Option Explicit On Error Resume Next Dim strUrlRoot Dim strHttpHost Dim strUserAgent Response.Clear Response.Buffer = True Response.ContentType = "text/plain" Response.CacheControl = "public" Response.Write "# Robots.txt" & vbCrLf Response.Write "# For more information on this file see:" & vbCrLf Response.Write "# http://www.robotstxt.org/" & vbCrLf & vbCrLf strHttpHost = LCase(Request.ServerVariables("HTTP_HOST")) strUserAgent = LCase(Request.ServerVariables("HTTP_USER_AGENT")) strUrlRoot = "http://" & strHttpHost Response.Write "# Define the sitemap path" & vbCrLf Response.Write "Sitemap: " & strUrlRoot & "/sitemap.xml" & vbCrLf & vbCrLf Response.Write "# Make changes for all web spiders" & vbCrLf Response.Write "User-agent: *" & vbCrLf Response.Write "Allow: /" & vbCrLf Response.Write "Disallow: " & vbCrLf Response.End %>
The following example file is also pretty simple, and you would save this code as Sitemap.asp in the root of your website. There is a section in the code where it loops through the file system looking for files with the *.html file extension and only creates URLs for those files. If you want other files included in your results, or you want to change the code from static to dynamic content, this is where you would need to update the file accordingly.
<% Option Explicit On Error Resume Next Response.Clear Response.Buffer = True Response.AddHeader "Connection", "Keep-Alive" Response.CacheControl = "public" Dim strFolderArray, lngFolderArray Dim strUrlRoot, strPhysicalRoot, strFormat Dim strUrlRelative, strExt Dim objFSO, objFolder, objFile strPhysicalRoot = Server.MapPath("/") Set objFSO = Server.CreateObject("Scripting.Filesystemobject") strUrlRoot = "http://" & Request.ServerVariables("HTTP_HOST") ' Check for XML or TXT format. If UCase(Trim(Request("format")))="XML" Then strFormat = "XML" Response.ContentType = "text/xml" Else strFormat = "TXT" Response.ContentType = "text/plain" End If ' Add the UTF-8 Byte Order Mark. Response.Write Chr(CByte("&hEF")) Response.Write Chr(CByte("&hBB")) Response.Write Chr(CByte("&hBF")) If strFormat = "XML" Then Response.Write "<?xml version=""1.0"" encoding=""UTF-8""?>" & vbCrLf Response.Write "<urlset xmlns=""http://www.sitemaps.org/schemas/sitemap/0.9"">" & vbCrLf End if ' Always output the root of the website. Call WriteUrl(strUrlRoot,Now,"weekly",strFormat) ' -------------------------------------------------- ' This following section contains the logic to parse ' the directory tree and return URLs based on the ' static *.html files that it locates. This is where ' you would change the code for dynamic content. ' -------------------------------------------------- strFolderArray = GetFolderTree(strPhysicalRoot) For lngFolderArray = 1 to UBound(strFolderArray) strUrlRelative = Replace(Mid(strFolderArray(lngFolderArray),Len(strPhysicalRoot)+1),"\","/") Set objFolder = objFSO.GetFolder(Server.MapPath("." & strUrlRelative)) For Each objFile in objFolder.Files strExt = objFSO.GetExtensionName(objFile.Name) If StrComp(strExt,"html",vbTextCompare)=0 Then If StrComp(Left(objFile.Name,6),"google",vbTextCompare)<>0 Then Call WriteUrl(strUrlRoot & strUrlRelative & "/" & objFile.Name, objFile.DateLastModified, "weekly", strFormat) End If End If Next Next ' -------------------------------------------------- ' End of file system loop. ' -------------------------------------------------- If strFormat = "XML" Then Response.Write "</urlset>" End If Response.End ' ====================================================================== ' ' Outputs a sitemap URL to the client in XML or TXT format. ' ' tmpStrFreq = always|hourly|daily|weekly|monthly|yearly|never ' tmpStrFormat = TXT|XML ' ' ====================================================================== Sub WriteUrl(tmpStrUrl,tmpLastModified,tmpStrFreq,tmpStrFormat) On Error Resume Next Dim tmpDate : tmpDate = CDate(tmpLastModified) ' Check if the request is for XML or TXT and return the appropriate syntax. If tmpStrFormat = "XML" Then Response.Write " <url>" & vbCrLf Response.Write " <loc>" & Server.HtmlEncode(tmpStrUrl) & "</loc>" & vbCrLf Response.Write " <lastmod>" & Year(tmpLastModified) & "-" & Right("0" & Month(tmpLastModified),2) & "-" & Right("0" & Day(tmpLastModified),2) & "</lastmod>" & vbCrLf Response.Write " <changefreq>" & tmpStrFreq & "</changefreq>" & vbCrLf Response.Write " </url>" & vbCrLf Else Response.Write tmpStrUrl & vbCrLf End If End Sub ' ====================================================================== ' ' Returns a string array of folders under a root path ' ' ====================================================================== Function GetFolderTree(strBaseFolder) Dim tmpFolderCount,tmpBaseCount Dim tmpFolders() Dim tmpFSO,tmpFolder,tmpSubFolder ' Define the initial values for the folder counters. tmpFolderCount = 1 tmpBaseCount = 0 ' Dimension an array to hold the folder names. ReDim tmpFolders(1) ' Store the root folder in the array. tmpFolders(tmpFolderCount) = strBaseFolder ' Create file system object. Set tmpFSO = Server.CreateObject("Scripting.Filesystemobject") ' Loop while we still have folders to process. While tmpFolderCount <> tmpBaseCount ' Set up a folder object to a base folder. Set tmpFolder = tmpFSO.GetFolder(tmpFolders(tmpBaseCount+1)) ' Loop through the collection of subfolders for the base folder. For Each tmpSubFolder In tmpFolder.SubFolders ' Increment the folder count. tmpFolderCount = tmpFolderCount + 1 ' Increase the array size ReDim Preserve tmpFolders(tmpFolderCount) ' Store the folder name in the array. tmpFolders(tmpFolderCount) = tmpSubFolder.Path Next ' Increment the base folder counter. tmpBaseCount = tmpBaseCount + 1 Wend GetFolderTree = tmpFolders End Function %>
Note: There are two helper methods in the preceding example that I should call out:
The last step is to add the URL Rewrite rules to the Web.config file in the root of your website. The following example is a complete Web.config file, but you could merge the rules into your existing Web.config file if you have already created one for your website. These rules are pretty simple, they rewrite all inbound requests for Robots.txt to Robots.asp, and they rewrite all requests for Sitemap.xml to Sitemap.asp?format=XML and requests for Sitemap.txt to Sitemap.asp?format=TXT; this allows requests for both the XML-based and text-based sitemaps to work, even though the Robots.txt file contains the path to the XML file. The last part of the URL Rewrite syntax returns HTTP 404 errors if anyone tries to send direct requests for either the Robots.asp or Sitemap.asp files; this isn't absolutely necesary, but I like to mask what I'm doing from prying eyes. (I'm kind of geeky that way.)
<?xml version="1.0" encoding="UTF-8"?> <configuration> <system.webServer> <rewrite> <rewriteMaps> <clear /> <rewriteMap name="Static URL Rewrites"> <add key="/robots.txt" value="/robots.asp" /> <add key="/sitemap.xml" value="/sitemap.asp?format=XML" /> <add key="/sitemap.txt" value="/sitemap.asp?format=TXT" /> </rewriteMap> <rewriteMap name="Static URL Failures"> <add key="/robots.asp" value="/" /> <add key="/sitemap.asp" value="/" /> </rewriteMap> </rewriteMaps> <rules> <clear /> <rule name="Static URL Rewrites" patternSyntax="ECMAScript" stopProcessing="true"> <match url=".*" ignoreCase="true" negate="false" /> <conditions> <add input="{Static URL Rewrites:{REQUEST_URI}}" pattern="(.+)" /> </conditions> <action type="Rewrite" url="{C:1}" appendQueryString="false" redirectType="Temporary" /> </rule> <rule name="Static URL Failures" patternSyntax="ECMAScript" stopProcessing="true"> <match url=".*" ignoreCase="true" negate="false" /> <conditions> <add input="{Static URL Failures:{REQUEST_URI}}" pattern="(.+)" /> </conditions> <action type="CustomResponse" statusCode="404" subStatusCode="0" /> </rule> <rule name="Prevent rewriting for static files" patternSyntax="Wildcard" stopProcessing="true"> <match url="*" /> <conditions> <add input="{REQUEST_FILENAME}" matchType="IsFile" /> </conditions> <action type="None" /> </rule> </rules> </rewrite> </system.webServer> </configuration>
That sums it up for this blog; I hope that you get some good ideas from it.
For more information about the syntax in Robots.txt and Sitemap.xml files, see the following URLs:
Note: This blog was originally posted at http://blogs.msdn.com/robert_mcmurray/
28 June 2012 • by Bob • IIS, URL Rewrite
One of the applications that I like to use on my websites it the Quick Digital Image Gallery (QDIG), which is a simple PHP-based image gallery that has just enough features to be really useful without a lot of work on my part to get it working. (Simple is always better - ;-].) Here's a screenshot of QDIG in action with some Bing photos:
The trouble is, QDIG creates some really heinous query string lines; see the URL line in the following screenshot for an example:
I don't know about you, but in today's SEO-friendly world, I hate long and convoluted query strings. Which brings me to one of my favorite subjects: URL Rewrite for IIS
If you've been around IIS for a while, you probably already know that there are a lot of great things that you can do with the IIS URL Rewrite module, and one of the things that URL Rewrite is great at is cleaning up complex query strings into something that's a little more intuitive.
It would take way to long to describe all of the steps to create the following rules with the URL Rewrite interface, so I'll just include the contents of my web.config file for my QDIG directory - which is a physical folder called "QDIG" that is under the root of my website:
<?xml version="1.0" encoding="UTF-8"?> <configuration> <system.webServer> <rewrite> <rules> <!-- Rewrite the inbound URLs into the correct query string. --> <rule name="RewriteInboundQdigURLs" stopProcessing="true"> <match url="Qif/(.*)/Qiv/(.*)/Qis/(.*)/Qwd/(.*)" /> <conditions> <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" /> <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" /> </conditions> <action type="Rewrite" url="/QDIG/?Qif={R:1}&Qiv={R:2}&Qis={R:3}&Qwd={R:4}" appendQueryString="false" /> </rule> </rules> <outboundRules> <!-- Rewrite the outbound URLs into user-friendly URLs. --> <rule name="RewriteOutboundQdigURLs" preCondition="ResponseIsHTML" enabled="true"> <match filterByTags="A, Img, Link" pattern="^(.*)\?Qwd=([^=&]+)&(?:amp;)?Qif=([^=&]+)&(?:amp;)?Qiv=([^=&]+)&(?:amp;)?Qis=([^=&]+)(.*)" /> <action type="Rewrite" value="/QDIG/Qif/{R:3}/Qiv/{R:4}/Qis/{R:5}/Qwd/{R:2}" /> </rule> <!-- Rewrite the outbound relative QDIG URLs for the correct path. --> <rule name="RewriteOutboundRelativeQdigFileURLs" preCondition="ResponseIsHTML" enabled="true"> <match filterByTags="Img" pattern="^\.\/qdig-files/(.*)$" /> <action type="Rewrite" value="/QDIG/qdig-files/{R:1}" /> </rule> <!-- Rewrite the outbound relative file URLs for the correct path. --> <rule name="RewriteOutboundRelativeFileURLs" preCondition="ResponseIsHTML" enabled="true"> <match filterByTags="Img" pattern="^\.\/(.*)$" /> <action type="Rewrite" value="/QDIG/{R:1}" /> </rule> <preConditions> <!-- Define a precondition so the outbound rules only apply to HTML responses. --> <preCondition name="ResponseIsHTML"> <add input="{RESPONSE_CONTENT_TYPE}" pattern="^text/html" /> </preCondition> </preConditions> </outboundRules> </rewrite> </system.webServer> </configuration>
Here's the breakdown of what all of the rules do:
Once you have these rules in place, you get nice user-friendly URLs in QDIG:
I should also point out that these rules also support changing the style from thumbnails to file names to file numbers, etc.
All of that being said, there is one thing that these rules do not support - and that's nested folders under my QDIG application. I don't like to use folders under my QDIG folder - I like to use separate folders with the QDIG file in it, because this makes each gallery self-contained and easily transportable. That being said, after I had written the text for this blog, I tried to use a subfolder under my QDIG application and that didn't work. By looking at what was going on, I'm pretty sure that it would be pretty trivial to write some URL Rewrite rules that would accommodate using subfolders, but that's another project for another day. ;-]
Note: This blog was originally posted at http://blogs.msdn.com/robert_mcmurray/